Fix registry unsupported pipeline update #96497

@afoucret afoucret commented Jun 1, 2023

Closes: #95766

The upgrade tests were failing because the pipelines use processor configuration options that were introduced only recently:

  • The behavioral analytics pipeline uses the ignore_missing parameter of the uri_parts processor (introduced in 8.8.0).
  • The log pipeline uses the ignore_missing_pipeline parameter of the pipeline processor (introduced in 8.9.0).

Both registries are now installed only once all the nodes in the cluster have been upgraded to the required version.
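
To make the gating idea concrete, here is a minimal, self-contained sketch of an "all nodes at or above a required version" check. It is not the code from this PR: `NodeVersion` and `MixedClusterGate` are hypothetical names invented for illustration and do not use the Elasticsearch `Version` or `ClusterState` classes.

```java
import java.util.List;

// Hypothetical stand-in for a node version; not the Elasticsearch Version class.
final class NodeVersion implements Comparable<NodeVersion> {
    final int major;
    final int minor;
    final int patch;

    NodeVersion(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // Parses a plain "major.minor.patch" string such as "8.9.0".
    static NodeVersion parse(String text) {
        String[] parts = text.split("\\.");
        return new NodeVersion(
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        );
    }

    @Override
    public int compareTo(NodeVersion other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(patch, other.patch);
    }
}

// Hypothetical helper illustrating the gate described above.
final class MixedClusterGate {
    // True only when every node runs at least the required version.
    // An empty node list trivially passes in this sketch.
    static boolean allNodesAtLeast(List<NodeVersion> nodeVersions, NodeVersion required) {
        return nodeVersions.stream().allMatch(v -> v.compareTo(required) >= 0);
    }
}
```

With a gate like this, a cluster that still contains an 8.8.x node would not get the log pipeline installed, because the ignore_missing_pipeline parameter is only understood from 8.9.0 onward.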

@afoucret afoucret requested a review from jimczi June 1, 2023 13:53
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.9.0 labels Jun 1, 2023
@afoucret afoucret requested review from eyalkoren and davidkyle June 1, 2023 13:53
// Test that stats are serializable and can be gathered
getTrainedModelStats();
}
// Test that stats are serializable and can be gathered
Contributor Author

@davidkyle Reverted your workaround.

for (int i = 0; i < 10; i++) {
assertInfer(modelId);
}
waitForDeploymentStarted(modelId);
Contributor Author

@davidkyle Reverted your workaround.

Member

Thanks @afoucret. I added the test-full-bwc label for extra CI coverage. With that label, the upgrade tests are run against all backwards-compatible versions.

@@ -253,4 +258,13 @@ protected boolean requiresMasterNode() {
// there and the ActionNotFoundTransportException errors are then prevented.
return true;
}

@Override
protected boolean isClusterReady(ClusterChangedEvent event) {
Contributor Author

@eyalkoren Now checking that all nodes are at least on v8.9.0, so that it does not fail because of the ignore_missing_pipeline parameter.

Contributor

What are you preventing with this?
AFAIK, once the local implementation of requiresMasterNode() returns true, as in this case, this ensures that the upgrade will occur only on the elected master. I believe that during rolling upgrades this ensures that this happens only after all non-master nodes are already upgraded.
Since the usage of ignore_missing_pipeline was introduced in #95971, which was added to 8.9.0 and not back-ported, I am not sure whether this is required.

BTW, AnalyticsTemplateRegistry also requires master node, so double-check if this is required in the original case as well.

@jbaiera @dakrone please confirm or enlighten me if I got it wrong

Member

@davidkyle davidkyle Jun 5, 2023

There is no guarantee that the non-master nodes will be upgraded first during a rolling upgrade. This is a sensible precaution.

Contributor

Would nodes apply settings coming from a master of a higher version?

Contributor

@dakrone just explained that I indeed got it wrong and there is no actual enforced guarantee as to the order of upgrade in code. This should normally not occur if the upgrade is done according to documentation (master node last), but such verification does make sense
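
The scenario discussed in this thread can be sketched in a few lines. The example below is purely illustrative and assumes a hypothetical `Node` model (name, major/minor version, elected-master flag); none of these types exist in Elasticsearch. It contrasts a check that only looks at the elected master with one that looks at every node, for a rolling upgrade in which the master happened to be upgraded first.

```java
import java.util.List;

// Hypothetical model of a rolling upgrade; names are invented for this sketch.
final class RollingUpgradeSketch {

    static final class Node {
        final String name;
        final int major;
        final int minor;
        final boolean electedMaster;

        Node(String name, int major, int minor, boolean electedMaster) {
            this.name = name;
            this.major = major;
            this.minor = minor;
            this.electedMaster = electedMaster;
        }

        // True when this node's version is at least requiredMajor.requiredMinor.
        boolean atLeast(int requiredMajor, int requiredMinor) {
            return major > requiredMajor || (major == requiredMajor && minor >= requiredMinor);
        }
    }

    // Looks only at the elected master, which is all that "run on the master" implies.
    static boolean masterAtLeast(List<Node> nodes, int major, int minor) {
        return nodes.stream().filter(n -> n.electedMaster).allMatch(n -> n.atLeast(major, minor));
    }

    // Looks at every node, which is the stricter precaution discussed above.
    static boolean allNodesAtLeast(List<Node> nodes, int major, int minor) {
        return nodes.stream().allMatch(n -> n.atLeast(major, minor));
    }
}
```

If the master is already on 8.9 while a data node is still on 8.8, the master-only check passes and the all-nodes check does not, which is the gap the extra verification closes.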

@afoucret afoucret added >test-failure Triaged test failures from CI :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team and removed needs:triage Requires assignment of a team area label labels Jun 1, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/ent-search-eng (Team:Enterprise Search)

@davidkyle davidkyle added the test-full-bwc Trigger full BWC version matrix tests label Jun 5, 2023
Member

@davidkyle davidkyle left a comment

LGTM

@afoucret
Contributor Author

afoucret commented Jun 5, 2023

Thank you @davidkyle and @eyalkoren for your careful reviews!

@afoucret afoucret merged commit 94ea505 into elastic:main Jun 5, 2023
@dakrone
Member

dakrone commented Jun 5, 2023

I had this on my list to look at today (just found out about it this morning). Next time, since the Data Management team owns the IndexTemplateRegistry and StackTemplateRegistry, can you please apply that label to the PR as well so we get notified? I think passing the entire ClusterChangedEvent is overkill here, when for 99% (100%?) of the cases we could instead have an overridable method called getRequiredNodeVersion() on the registry, which the IndexTemplateRegistry could check.
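
One possible shape for that suggestion is sketched below. This is an assumption-laden illustration, not the real IndexTemplateRegistry: the class names are hypothetical, versions are encoded as plain integers (major * 10_000 + minor * 100 + patch, an encoding chosen only for this sketch), and the node versions are passed in as a list rather than read from cluster state. Only the method name getRequiredNodeVersion() comes from the comment above.

```java
import java.util.List;

// Hypothetical base class; NOT the real IndexTemplateRegistry.
abstract class SketchTemplateRegistry {

    // Assumed integer encoding for this sketch only.
    static int version(int major, int minor, int patch) {
        return major * 10_000 + minor * 100 + patch;
    }

    // Subclasses override this to declare the minimum node version they need.
    // The default of 0 means "no constraint".
    protected int getRequiredNodeVersion() {
        return 0;
    }

    // The base class owns the check, so subclasses never see the cluster event.
    final boolean isClusterReady(List<Integer> nodeVersions) {
        int required = getRequiredNodeVersion();
        return nodeVersions.stream().allMatch(v -> v >= required);
    }
}

// Needs ignore_missing on the uri_parts processor (8.8.0, per the PR description).
final class SketchAnalyticsRegistry extends SketchTemplateRegistry {
    @Override
    protected int getRequiredNodeVersion() {
        return version(8, 8, 0);
    }
}

// Needs ignore_missing_pipeline on the pipeline processor (8.9.0, per the PR description).
final class SketchLogsRegistry extends SketchTemplateRegistry {
    @Override
    protected int getRequiredNodeVersion() {
        return version(8, 9, 0);
    }
}
```

The appeal of this design is that each registry states a single number and the version comparison lives in one place; the trade-off is that a registry needing a condition other than node version would still need a broader hook.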

Labels
:EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team >test-failure Triaged test failures from CI test-full-bwc Trigger full BWC version matrix tests v8.9.0
Development

Successfully merging this pull request may close these issues.

behavioral_analytics-events-final_pipeline parsing fails in a mixed cluster
5 participants